Pattern Recognition Letters 34 (2013) 275-282 




ELSEVIER 



Contents lists available at SciVerse ScienceDirect 



Pattern Recognition Letters 



journal homepage: www.elsevier.com/locate/patrec 






Change detection based on a support vector data description that treats dependency 

Akram Belghith a,b,*, Christophe Collet a, Jean Paul Armspach b



a University of Strasbourg, LSIIT-UMR CNRS 7005, France
b University of Strasbourg, LINC-UMR CNRS 7237, France



ARTICLE INFO 

Article history: 

Received 16 February 2011 

Available online 2 November 2012 

Communicated by S. Sarkar 

Keywords: 

Classification 

SVDD 

Change detection 
Copula theory 



ABSTRACT 



Change detection is of great interest in many applications where image processing is involved; it is an 
essential step of image analysis in various applications, including remote sensing (e.g., assessing changes 
in forest ecosystems), surveillance (e.g., monitoring ice surface changes to detect glacier flows and
detecting changes caused by forest defoliation), and medical diagnosis (e.g., monitoring tumor growth). 
This paper aims to classify changes in multi-acquisition data using kernel-based support vector data 
description (SVDD). SVDD is a well-known method that enables one to map the data into a high-dimen- 
sional feature space in which a hypersphere encloses most patterns belonging to the unchanged class. In 
this work, we propose a new kernel function that combines the characteristics of basic kernel functions 
with new information about the feature distribution and the dependencies among samples. The depen- 
dencies among samples are characterized using copula theory; to our knowledge, this is the first time 
copula theory has been used in the SVDD framework. We demonstrate that the proposed kernel function 
is robust and has higher performance compared with classical support vector machine (SVM) and support 
vector data description (SVDD) methods. 

© 2012 Elsevier B.V. All rights reserved. 



1. Introduction 

Detecting changed areas in images is of great interest for diverse
disciplines (e.g., remote sensing (Walter, 2004; Bazi et al., 2005),
medical imaging used to quantify tumor growth (Bosc et al., 2003;
Troglio et al., 2010), driver assistance systems, and fall-detection
surveillance, among others). Change detection attempts to identify
state differences in a scene where evolution
phenomena occurred by observing sets of pixels that are signifi- 
cantly different among several acquisitions taken over a specified 
time period. The primary difficulty is distinguishing areas of real 
change from areas of normal and intrinsic variability (the latter 
are called unimportant changes) within the scene. Because of the 
large amount of data that must be analyzed, the change-detection 
task often requires an automated analysis process, even though there is
no universal measure of differences: the measure depends on the
application (Durucan and Ebrahimi, 2001), and the data may
be corrupted by various types of noise (Inglada and Mercier, 2007).

Several algorithms for change detection have been developed 
over the last few decades. Some must be supervised because of 
the difficulty of the task, whereas others are unsupervised; the dis- 
advantages of the latter approach include a loss of robustness and 
higher computational time requirements. The first approach relies
on supervised classification methods to detect changes among sev- 
eral acquisitions (Derrode et al., 1994). This task is equivalent to 
separating the data into two classes: changed and unchanged (or 
insignificantly changed) (the latter will be the class of interest in 
the following text). The process requires a ground truth to derive 
a suitable training set for the learning process of the classifiers. 
However, it is typically difficult and computationally expensive 
to determine the ground truth. Consequently, the use of unsuper- 
vised change-detection methods is crucial in many applications 
where the ground truth is unattainable (Fumera et al., 2000). 

Two very interesting and widely used unsupervised change- 
detection methods are the well-known Bayesian methods (Fumera 
et al., 2000) and the kernel methods (Ben-Hur et al., 2002). 
Although the former methods are relatively simple, they exhibit a
significant drawback: they require a significant amount of knowl- 
edge about the class of interest. However, such knowledge is not 
always available, particularly in highly complex applications, such 
as medical applications (Sanchez-Hernandez et al., 2007). More- 
over, when only weak changes occurred between the two data sets 
considered, the probability density function (pdf) of the changed 
data may be confused with the pdf of the unchanged data (for 
example, the Hidden Markov Model method typically attempts to 
regularize bad classification results because of this ill-posed prob- 
lem and the presence of outliers in the data (Belghith et al., 2009)). 
Despite these drawbacks, Bayesian methods are efficient tools for
including a priori knowledge through the a posteriori pdf.
Furthermore, kernel methods are more flexible. Indeed, kernel-
based functions have several advantages compared with other ap- 
proaches; they reduce the dimensionality of the data, increase the 
reliability and the robustness of the method in the presence of 
noise and allow flexible mappings between objects (inputs) represented
by feature vectors and class labels (outputs) (Shawe-Taylor, 2004).
Despite having these advantages, the kernel-based
change-detection method is not time consuming and thus can be 
used in real-time applications. The main disadvantage of kernel 
methods is that one must choose the kernel function; this choice
depends strongly on the application (Scholkopf et al., 2000). 

From the different kernel methods presented in the literature 
(e.g., (Furey et al., 2000; Bruzzone et al., 2006)), we have elected 
to use the support vector data description (SVDD) method (Tax 
and Duin, 2004) here. The SVDD classifier method maps the data 
into a high-dimensional feature space. In this new space, a hyper- 
sphere that encloses most of the dataset belonging to the class of 
interest (the target class, which corresponds to the unchanged data) 
and excludes the other observations (that will be considered outli- 
ers) is defined. In this paper, the change-detection problem is trea- 
ted in an unsupervised way. Our aim is to identify patterns that 
belong to the unchanged class without using any ground truth. To 
yield a more effective description of the change-detection problem, 
an outlier class is used for the SVDD hyperparameter estimation.

Although basic kernel functions can be successfully applied for 
change detection (Enzweiler and Gavrila, 2009; Camps-Valls et al., 
2008), they do not exploit the additional constraints that are often 
available, such as the dependencies and the distribution of differ- 
ent features. We show in this paper that the change detection 
should be more robust, more accurate and more efficient if such 
information is integrated and correctly modeled by the change- 
detection method. To account for these characteristics in our 
change-detection scheme, we propose a new kernel function that 
combines some properties of the old kernel functions with new 
information about the feature distribution and dependencies. The 
challenge is then to determine the appropriate way to treat the 
dependencies. We have opted to use copula theory, which has pre- 
viously been used to effectively treat dependencies (Joe, 1997). In 
particular, we demonstrate that the use of the new kernel function 
increases the performance of the change detection compared with 
the performance when the basic kernel functions are used. The 
proposed method is termed support vector data description with 
dependency handling (SV3DH). 

This paper is divided into two additional sections. In the next sec- 
tion, the proposed change-detection method SV3DH is presented. 
Then, in Section 2, results obtained by applying the proposed 
scheme to synthetic and real data are presented. We emphasize 
the robustness and the efficiency of the novel proposed SV3DH 
approach compared with classical SVM and SVDD approaches. 

3.1. The copula kernel function

Our aim is to detect changes without using any ground truth
information. Let $\{x_i\}_{i=1,\dots,K}$, with $x_i \in \mathbb{R}^N$, be the vector containing
the $N$ features of a given object and $\{r_i\}_{i=1,\dots,K}$, with $r_i \in \{\pm 1\}$, the
corresponding output of $\{x_i\}_{i=1,\dots,K}$. Our goal is to blindly classify
the data into two classes, a class of targets (i.e., the unchanged
or $r_i = +1$ class) and a class of outliers (i.e., the changed or
$r_i = -1$ class), using the SVDD method. In this subsection, we define
the proposed kernel function.

3.1.1. The kernel function 

The kernel function enables one to map a data set defined over
the input space $\mathcal{I}$ into a higher-dimensional Hilbert space $\mathcal{H}$ (the
feature space). The mapping function is denoted by $\varphi : \mathcal{I} \to \mathcal{H}$. If a
given algorithm can be expressed in the form of dot products in
the input space, its non-linear kernel version can be expressed in
terms of dot products among the mapped samples. Kernel methods
compute the similarity between samples using pairwise inner
products taken between mapped samples. Thus, the so-called kernel
matrix, $\mathcal{K}_{ij} = \mathcal{K}(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle$, contains all the information
necessary to perform many classical linear algorithms in the
feature space. The bottleneck for any method based on kernel functions
is the proper definition of a kernel function that accurately
reflects the similarity among samples. However, metric distances
are not all allowed. In fact, valid kernels are only those that fulfill
Mercer's Theorem (the kernel matrix must be positive semi-definite
(Minh et al., 2006)). The most common kernels include the
following:

• the linear kernel $\mathcal{K}(x_i, x_j) = x_i \cdot x_j$,

• the polynomial kernel $\mathcal{K}(x_i, x_j) = (x_i \cdot x_j)^d$, $d > 0$,

• and the Radial Basis Function (RBF), $\mathcal{K}(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$, $\sigma > 0$.
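
The three kernels listed above can be sketched in a few lines of Python. This is only an illustration: the function names, the toy random data and the numerical tolerance are assumptions, and checking the eigenvalues of one kernel matrix is a sanity check of Mercer's condition on that sample, not a proof.

```python
import numpy as np


def linear_kernel(x, y):
    """Linear kernel: the plain dot product x . y."""
    return float(np.dot(x, y))


def polynomial_kernel(x, y, d=2):
    """Polynomial kernel (x . y)^d with integer degree d > 0."""
    return float(np.dot(x, y)) ** d


def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel exp(-||x - y||^2 / (2 sigma^2)) with sigma > 0."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))


def gram_matrix(kernel, samples):
    """Kernel matrix K[i, j] = kernel(samples[i], samples[j])."""
    n = len(samples)
    return np.array([[kernel(samples[i], samples[j]) for j in range(n)]
                     for i in range(n)])


def is_positive_semidefinite(K, rel_tol=1e-9):
    """Numerically test whether a kernel matrix has no negative eigenvalue."""
    eigenvalues = np.linalg.eigvalsh((K + K.T) / 2.0)
    scale = max(1.0, float(np.abs(eigenvalues).max()))
    return bool(eigenvalues.min() >= -rel_tol * scale)


rng = np.random.default_rng(0)            # toy data, purely illustrative
samples = rng.normal(size=(20, 3))
for kernel in (linear_kernel, polynomial_kernel, rbf_kernel):
    print(kernel.__name__, is_positive_semidefinite(gram_matrix(kernel, samples)))
```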

Although these kernel functions have been applied successfully 
for change detection, they do not exploit additional constraints 
such as the feature dependency (a measurement that models the 
distance to the independence for two random variables) and thus 
make the change detection less robust. Well-known dependence 
measures include the correlation and covariance matrices. The 
covariance and correlation are related parameters that indicate 
how two random variables co-vary. Consider two images that are 
both realizations of 2D random processes. If the images are af- 
fected by the same linear change, both random fields will tend to 
increase or decrease with the same amplitude. The covariance 
and correlation matrices are useful tools to measure such a ten- 
dency. Unfortunately, the covariance and correlation matrices only 
measure the linear dependency (Bentler and Dudgeon, 1996); they 
are useless when no linear dependency exists (e.g., in remote sens- 
ing images (Inglada and Giros, 2004)). For this reason, we propose 
using copula theory to model non-linear dependency for the mul- 
tivariate case. Indeed, from a statistical point of view, accounting 
for this information in the kernel expression is of primary impor- 
tance for increasing the feature behavior. 
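
The limitation of correlation noted above can be seen on a small constructed example. In the sketch below (toy values chosen for illustration, not data from the paper), one variable is a deterministic function of the other, yet the Pearson correlation is numerically zero because the relationship is quadratic rather than linear.

```python
import numpy as np

# Symmetric grid so that the linear part of the relationship cancels exactly.
x = np.linspace(-1.0, 1.0, 1001)

y_linear = 2.0 * x + 1.0      # linear dependency on x
y_quadratic = x ** 2          # non-linear, but fully determined by x

r_linear = float(np.corrcoef(x, y_linear)[0, 1])
r_quadratic = float(np.corrcoef(x, y_quadratic)[0, 1])

print("correlation with linear dependency:   ", r_linear)
print("correlation with quadratic dependency:", r_quadratic)
```

A dependency measure that only looks at the correlation would therefore treat `x` and `y_quadratic` as unrelated.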

3.1.2. The proposed kernel function

Our aim is to properly model and integrate both the depen- 
dency and the feature distribution into the kernel function to yield 
a more accurate change-detection result. The new kernel function 
should combine the old kernel function's distance between features 
(we use the RBF function, which offers some degrees of freedom 
because of the hyperparameter $\sigma$) with new information about
the features' distance to the independence. We propose a simple 
yet powerful kernel function based on copula theory that enables 
us to model non-linear dependencies among features. 

Several studies demonstrate the effectiveness of the Gaussian
copula $c_G$ for treating dependency (Joe, 1997). Thus, we adopt the
Gaussian copula, which is given here: $\forall y = (y_1, \dots, y_L) \in \mathbb{R}^L$,

$$c_G(y) = \frac{1}{|\Gamma|^{1/2}} \exp\left(-\frac{\tilde{y}^T (\Gamma^{-1} - I)\, \tilde{y}}{2}\right) \qquad (1)$$

where $\tilde{y} = (\Phi^{-1}(y_1), \dots, \Phi^{-1}(y_L))^T$, in which $\Phi(\cdot)$ is the standard Gaussian
cumulative distribution function, $\Gamma$ is the inter-data correlation
matrix, and $I$ is the $L \times L$ identity matrix.

The proposed kernel function is

$$\mathcal{K}(x_i, x_j) = \left(\mathbb{E}[C_G(x_i, x_j)]\right) \exp\left(-\|x_i - x_j\|^2 / 2\sigma^2\right) \qquad (2)$$

where $C_G(x_i, x_j) = [c_G(x_i(1), x_j(1)), \dots, c_G(x_i(K), x_j(K))]$, in which $K$ is
the length of the vector $x_i$. As the dependency of a couple $(x_i, x_j)$ increases,
the $\mathbb{E}[C_G(x_i, x_j)]$ value approaches 1.

The inter-data correlation matrix $\Gamma$ is estimated using the maximum-likelihood
estimator (MLE) (Bouye et al., 2000).
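
A minimal sketch of how the Gaussian copula density and the proposed kernel could be evaluated is given below. Several points are assumptions made for illustration: the features are taken to be already mapped into (0, 1), for example through their marginal distribution functions; the 2 x 2 matrix `gamma` is supplied by hand instead of being estimated by maximum likelihood; and the operator applied to $C_G$ is read as the mean over the $K$ components. The function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()  # standard Gaussian, used for its inverse CDF


def gaussian_copula_density(u, gamma):
    """|Gamma|^(-1/2) * exp(-y^T (Gamma^-1 - I) y / 2), with y = Phi^-1(u).

    Every entry of u must lie strictly between 0 and 1, otherwise
    NormalDist.inv_cdf raises statistics.StatisticsError.
    """
    y = np.array([_STD_NORMAL.inv_cdf(float(v)) for v in u])
    gamma = np.asarray(gamma, dtype=float)
    quad = y @ (np.linalg.inv(gamma) - np.eye(len(y))) @ y
    return float(np.exp(-0.5 * quad) / math.sqrt(np.linalg.det(gamma)))


def copula_rbf_kernel(u_i, u_j, gamma, sigma=1.0):
    """Mean pairwise copula density (assumed reading) times the RBF term."""
    u_i = np.asarray(u_i, dtype=float)
    u_j = np.asarray(u_j, dtype=float)
    dependency = np.mean([gaussian_copula_density((a, b), gamma)
                          for a, b in zip(u_i, u_j)])
    rbf = np.exp(-np.sum((u_i - u_j) ** 2) / (2.0 * sigma ** 2))
    return float(dependency * rbf)


independent = np.eye(2)                        # no dependency
correlated = np.array([[1.0, 0.5],
                       [0.5, 1.0]])            # hand-picked correlation of 0.5

u_i, u_j = [0.2, 0.4], [0.3, 0.9]              # toy feature vectors in (0, 1)
print(copula_rbf_kernel(u_i, u_j, independent))
print(copula_rbf_kernel(u_i, u_j, correlated))
```

With the identity matrix the copula density is 1 everywhere, so the sketch falls back to the plain RBF kernel; a non-trivial `gamma` reweights the RBF similarity by the modeled dependency.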

In the next paragraph, we show that the new kernel function
satisfies Mercer's condition (Minh et al., 2006).

3.1.3. Satisfying Mercer's condition 

By Schur's theorem (Horn and Johnson, 2005), the product of
two valid kernels is also a valid kernel. Because the term
$\exp(-\|x_i - x_j\|^2 / 2\sigma^2)$ satisfies Mercer's condition, we must only
demonstrate that the second term $(\mathbb{E}[C_G(x_i, x_j)])$ is also a valid kernel.
The latter term can be written as a sum of $K$ functions of the
form

$$c_G(x_i(k), x_j(k)) = \frac{1}{|\Gamma|^{1/2}} \exp\left(-\frac{\tilde{y}^T (\Gamma^{-1} - I)\, \tilde{y}}{2}\right) \qquad (3)$$

where $\tilde{y} = (\Phi^{-1}(x_i(k)), \Phi^{-1}(x_j(k)))^T$.

In (Horn and Johnson, 2005), the authors prove that the combination
of valid kernels is a valid kernel. Therefore, to demonstrate
the validity of the kernel $c_G(x_i, x_j)$, it is sufficient to demonstrate
that both the $\Phi^{-1}$ function and
$\exp\left(-\frac{(x_i(k), x_j(k))^T (\Gamma^{-1} - I)\, (x_i(k), x_j(k))}{2}\right)$ are
valid kernels. The inverse Gaussian cumulative distribution function
$\Phi^{-1}$ is equal to $\sqrt{2}\, \mathrm{erf}^{-1}(2y - 1)$, where $\mathrm{erf}^{-1}$ is the inverse of
the error function. Because $\mathrm{erf}^{-1}$ can be arbitrarily closely approximated
by polynomials with positive coefficients, $\Phi^{-1}$ is a valid kernel.
We now prove that
$\exp\left(-\frac{(x_i(k), x_j(k))^T (\Gamma^{-1} - I)\, (x_i(k), x_j(k))}{2}\right)$
is a valid kernel function. This function can be written as the product

$$\exp\left(\frac{(x_i(k), x_j(k))^T (x_i(k), x_j(k))}{2}\right) \times \exp\left(-\frac{(x_i(k), x_j(k))^T\, \Gamma^{-1}\, (x_i(k), x_j(k))}{2}\right)$$

The first term is a valid kernel because it is simply the exponential
of the outer product of $(x_i(k), x_j(k))$ with itself. Because $\Gamma^{-1}$ is
a symmetric positive semi-definite matrix and the exponential of
the opposite of a valid kernel is a valid kernel (Horn and Johnson,
2005), the second term is a valid kernel as well. Therefore, the proposed
kernel satisfies Mercer's condition.
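
The product rule invoked at the start of this argument (Schur's theorem) can be checked numerically on a toy sample. The sketch below uses made-up random data and two standard kernels rather than the copula term itself: it builds two positive semi-definite kernel matrices and verifies that their element-wise product has no significantly negative eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)             # toy data, purely illustrative
points = rng.normal(size=(15, 4))

# Two valid kernel matrices computed on the same samples.
K_linear = points @ points.T
sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
K_rbf = np.exp(-sq_dists / 2.0)            # RBF kernel with sigma = 1

# Kernel matrix of the product kernel = element-wise (Hadamard) product.
K_product = K_linear * K_rbf


def min_eigenvalue(K):
    """Smallest eigenvalue of the symmetrized matrix."""
    return float(np.linalg.eigvalsh((K + K.T) / 2.0).min())


for name, K in (("linear", K_linear), ("rbf", K_rbf), ("product", K_product)):
    print(name, min_eigenvalue(K))
```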

3.2. The SVDD algorithm 

The proposed scheme is based on two steps: (1) an initialization
step and (2) the SVDD core algorithm. 

3.2.1. Fuzzy K-means initialization 

The first step of the proposed change-detection scheme is to 
identify the two classes, the class of targets and the class of outli- 
ers, that are required to initialize the SVDD classifier. To address 
the gradual transition between the classes, we apply the fuzzy 
K-means method (Duda et al., 2001) to extract classes. To estimate
the membership function that defines the degree of membership 
with which an element belongs to the class of targets, we use an 
S-membership function (Duda et al., 2001). 

Let $\mu$ be the estimated membership of an object for the target
class. At the end of the K-means-based initialization step, we obtain
two hard classes and two fuzzy classes: (1) a hard class of
the target population, which is defined by $\mu = 1$ and denoted $C_{ht}$;
(2) a fuzzy class of the target population, which is defined by
$0.5 < \mu < 1$ and denoted $C_{ft}$; (3) a fuzzy class of the outlier population,
which is defined by $0 < \mu \le 0.5$ and denoted $C_{fo}$; and (4) a
hard class of the outlier population, which is defined by $\mu = 0$ and
denoted $C_{ho}$. This result will be used for initializing the SVDD
algorithm.
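
The partition into four classes can be sketched as follows. The text only cites an S-membership function, so the code assumes Zadeh's standard S-function with crossover point $b = (a + c)/2$; the scalar score fed to it, the breakpoints `a` and `c`, and the function names are hypothetical choices made for this illustration.

```python
def s_membership(score, a, c):
    """Zadeh's S-function: 0 below a, 1 above c, smooth in between."""
    if c <= a:
        raise ValueError("the upper breakpoint c must be larger than a")
    b = 0.5 * (a + c)                      # crossover point, membership 0.5
    if score <= a:
        return 0.0
    if score >= c:
        return 1.0
    if score <= b:
        return 2.0 * ((score - a) / (c - a)) ** 2
    return 1.0 - 2.0 * ((score - c) / (c - a)) ** 2


def membership_class(mu):
    """Map a target membership mu to one of the four initialization classes."""
    if mu == 1.0:
        return "C_ht"                      # hard target
    if mu > 0.5:
        return "C_ft"                      # fuzzy target
    if mu > 0.0:
        return "C_fo"                      # fuzzy outlier
    return "C_ho"                          # hard outlier


# Hypothetical scores: larger values mean "more likely unchanged".
for score in (-1.0, 0.25, 0.5, 0.75, 1.2):
    mu = s_membership(score, a=0.0, c=1.0)
    print(score, mu, membership_class(mu))
```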

3.2.2. The SVDD core algorithm 

The second step describes the target class by exploiting the 
information obtained from the target and outlier sets defined in 
the initialization step. 

Every object is characterized by a feature vector. The SVDD enables
us to distinguish between targets and outliers by defining a
closed boundary around the target data. This is equivalent to drawing
a minimum-volume hypersphere in the kernel feature space
that includes all or most of the target objects. The sphere is characterized
by its center $a$ and its radius $R > 0$. Thus, the problem is to
find a decision function $f$ that assigns the label 1 to each object $x_i$
such that $f(x_i) \le R$ and the label -1 to each object $x_i$ such that
$f(x_i) > R$.

In the following equations, the target objects (objects that belong
to $C_{ft} \cup C_{ht}$) are enumerated by the indices $i$ and $j$. Minimizing
the volume of the sphere reduces to minimizing $R^2$ with the constraints
(Tax and Duin, 2004)

$$\mathcal{K}(x_i - a, x_i - a) \le R^2 \quad \forall i \qquad (4)$$



To allow for the possibility of outliers in the target object set (the
set of objects that belong to $C_{ft} \cup C_{ht}$), the distance from $x_i$ to the
center $a$ should not be strictly smaller than $R^2$, but larger distances
should be penalized. Therefore, we introduce slack variables $\xi_i \ge 0$,
and the minimization problem becomes

$$\min_{R, a, \xi_i} \left\{ R^2 + C \sum_i \xi_i \right\} \qquad (5)$$

with the constraint that almost all objects that belong to the target
class should be within the sphere, which is given by

$$\mathcal{K}(x_i - a, x_i - a) \le R^2 + \xi_i \quad \forall i, \quad \xi_i \ge 0 \qquad (6)$$

where $\xi_i = 0 \ \forall i \in C_{ht}$. The parameter $C$ controls the trade-off between
the volume and the errors.

This is a classical optimization problem with inequality constraints.
Such an optimization problem can be solved by determining
the saddle point of the Lagrange function (Vapnik, 2000)
$L(R, a, \xi_i, \alpha_i, \gamma_i)$, which is

$$L(R, a, \xi_i, \alpha_i, \gamma_i) = R^2 + C \sum_i \xi_i - \sum_i \alpha_i \left\{ R^2 + \xi_i - \left( \mathcal{K}(x_i, x_i) - 2\mathcal{K}(a, x_i) + \mathcal{K}(a, a) \right) \right\} - \sum_i \gamma_i \xi_i \qquad (7)$$

where $\alpha_i \ge 0$ and $\gamma_i \ge 0$ are the Lagrange multipliers. One should
determine an optimal saddle point $(R^*, a^*, \xi^*, \alpha^*, \gamma^*)$ by minimizing
$L$ with respect to $(a, R, \xi)$ and maximizing $L$ with respect to non-negative
$(\alpha, \gamma)$. A solution in the dual space is determined using the
standard conditions for an optimum of a constrained function:

$$\frac{\partial L}{\partial R} = 0, \ \text{i.e.,} \ \sum_i \alpha_i^* = 1 \qquad (8)$$

$$\frac{\partial L}{\partial a} = 0, \ \text{i.e.,} \ a^* = \sum_i \alpha_i^* x_i \qquad (9)$$

$$\frac{\partial L}{\partial \xi_i} = 0, \ \text{i.e.,} \ \alpha_i^* + \gamma_i^* = C \qquad (10)$$

From the last equation, $\alpha_i^* + \gamma_i^* = C$, and because $\alpha_i \ge 0$, $\gamma_i \ge 0$,
the Lagrange multipliers $\gamma_i$ can be removed because we demand
that $0 \le \alpha_i \le C$. The use of the dual-variable Lagrangian yields
the following optimization problem:

Maximize

$$W(\alpha) = \sum_i \alpha_i \mathcal{K}(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j \mathcal{K}(x_i, x_j), \quad \text{under constraint: } 0 \le \alpha_i \le C \qquad (11)$$

Eq. (9) demonstrates that the center of the sphere is a linear
combination of the objects. Note that only the support vectors $x_s$
(the vectors $x_i$ with nonzero coefficients $\alpha_i^*$) are needed for the
description. $R^2$ is the distance from the center of the sphere $a$ to
any of the support vectors. Support vectors which fall outside the
description ($\alpha_i = C$) are excluded. Therefore,

$$R^2 = \mathcal{K}(x_s, x_s) - 2 \sum_i \alpha_i \mathcal{K}(x_s, x_i) + \sum_{i,j} \alpha_i \alpha_j \mathcal{K}(x_i, x_j) \qquad (12)$$

To test an object $z$, the distance to the center of the sphere must
be calculated. A test object $z$ belongs to the target class when this
distance is smaller than or equal to the radius $R$:

$$f(z) = \left( \mathcal{K}(z, z) - 2 \sum_i \alpha_i^* \mathcal{K}(z, x_i) + \sum_{i,j} \alpha_i^* \alpha_j^* \mathcal{K}(x_i, x_j) \right) \le R^2 \qquad (13)$$
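
The dual problem, the radius and the test function for target-only SVDD can be illustrated with a small solver. The pairwise update scheme below (move weight between the most violating pair of multipliers while keeping their sum equal to 1) is a swapped-in, SMO-style heuristic chosen for brevity, not the optimizer used in the paper; the function names, tolerances and toy points are assumptions.

```python
import numpy as np


def svdd_fit(K, C=1.0, tol=1e-10, max_iter=10000):
    """Maximize W(alpha) subject to sum(alpha) = 1 and 0 <= alpha <= C."""
    n = K.shape[0]
    if C * n < 1.0:
        raise ValueError("need C >= 1/n, otherwise sum(alpha) = 1 is infeasible")
    alpha = np.full(n, 1.0 / n)                  # feasible starting point
    diag = np.diag(K).copy()
    for _ in range(max_iter):
        grad = diag - 2.0 * (K @ alpha)          # dW / d alpha
        can_grow = np.flatnonzero(alpha < C - 1e-12)
        can_shrink = np.flatnonzero(alpha > 1e-12)
        if can_grow.size == 0:
            break
        i = can_grow[np.argmax(grad[can_grow])]
        j = can_shrink[np.argmin(grad[can_shrink])]
        gap = grad[i] - grad[j]
        if gap < tol:                            # no improving pair is left
            break
        # Moving weight t from j to i changes W by t * gap - t^2 * curvature.
        curvature = K[i, i] + K[j, j] - 2.0 * K[i, j]
        room = min(C - alpha[i], alpha[j])
        step = room if curvature <= 0.0 else min(room, gap / (2.0 * curvature))
        alpha[i] += step
        alpha[j] -= step
    return alpha


def svdd_radius2(K, alpha, C, eps=1e-8):
    """Squared radius from a support vector with 0 < alpha < C."""
    boundary = np.flatnonzero((alpha > eps) & (alpha < C - eps))
    if boundary.size == 0:
        raise ValueError("no support vector lies strictly between 0 and C")
    s = boundary[0]
    return float(K[s, s] - 2.0 * (K[s] @ alpha) + alpha @ K @ alpha)


def svdd_distance2(k_zz, k_zx, K, alpha):
    """Squared kernel distance from a test object z to the sphere center."""
    return float(k_zz - 2.0 * (k_zx @ alpha) + alpha @ K @ alpha)


# Toy target set with a linear kernel: the sphere spans (0, 0) and (2, 0).
X = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.5]])
K = X @ X.T
alpha = svdd_fit(K, C=1.0)
R2 = svdd_radius2(K, alpha, C=1.0)
for z in (np.array([1.0, 0.5]), np.array([3.0, 0.0])):
    d2 = svdd_distance2(z @ z, X @ z, K, alpha)
    print(z, "target" if d2 <= R2 else "outlier")
```

Any kernel matrix can be passed in place of the linear one; with the linear kernel the result can be checked by hand, since the center is simply the weighted mean of the points.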



When negative examples (objects which should be rejected) are 
available, they can be incorporated in the algorithm to improve the 
description. In contrast with the target examples, which should be 
within the sphere, the negative examples should be outside it. This 
data description differs from the standard Support Vector Classifier 
because the SVDD always obtains a closed boundary around one of 
the classes (the target class). In (Tax and Duin, 2004), the authors 
demonstrated that the joint use of both positive (targets) and neg- 
ative (outliers) examples improves the data description. Thus, we 
adopt the SVDD formulation for the minimum hypersphere. In the 
following equations, the target objects are enumerated by the indices
$i, j$ and the negative examples (patterns belonging to $C_{fo} \cup C_{ho}$)
are enumerated by $l, m$. Again, we allow for errors in both the target
and outlier sets and introduce slack variables $\xi_i \ge 0$ and $\xi_l \ge 0$.
Then, the minimization problem becomes (Tax and Duin, 2004)

$$\min_{R, a, \xi_i, \xi_l} \left\{ R^2 + C_1 \sum_i \xi_i + C_2 \sum_l \xi_l \right\} \qquad (14)$$

and the constraints become

$$\mathcal{K}(x_i - a, x_i - a) \le R^2 + \xi_i, \quad \mathcal{K}(x_l - a, x_l - a) \ge R^2 - \xi_l, \quad \xi_i \ge 0, \ \xi_l \ge 0, \ \forall i, l \qquad (15)$$

where $C_1$ and $C_2$ are two regularization parameters and $\xi_i = 0$
$\forall i \in C_{ht}$, $\xi_l = 0$ $\forall l \in C_{ho}$. We must determine the saddle point of
the primal Lagrangian $L(R, a, \xi_i, \xi_l, \alpha_i, \alpha_l, \gamma_i, \gamma_l)$, which is

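$$\begin{aligned} L(R, a, \xi_i, \xi_l, \alpha_i, \alpha_l, \gamma_i, \gamma_l) = {} & R^2 + C_1 \sum_i \xi_i + C_2 \sum_l \xi_l - \sum_i \gamma_i \xi_i - \sum_l \gamma_l \xi_l \\ & - \sum_i \alpha_i \left\{ R^2 + \xi_i - \left( \mathcal{K}(x_i, x_i) - 2\mathcal{K}(a, x_i) + \mathcal{K}(a, a) \right) \right\} \\ & - \sum_l \alpha_l \left\{ \left( \mathcal{K}(x_l, x_l) - 2\mathcal{K}(a, x_l) + \mathcal{K}(a, a) \right) - R^2 + \xi_l \right\} \end{aligned} \qquad (16)$$

where $\alpha_i \ge 0$, $\alpha_l \ge 0$, $\gamma_i \ge 0$ and $\gamma_l \ge 0$ are the Lagrange multipliers.
Setting the partial derivatives of $L$ with respect to $R$, $a$, $\xi_i$ and $\xi_l$
to zero yields the constraints

$$\sum_i \alpha_i^* - \sum_l \alpha_l^* = 1 \qquad (17)$$

$$a^* = \sum_i \alpha_i^* x_i - \sum_l \alpha_l^* x_l \qquad (18)$$

$$0 \le \alpha_i \le C_1, \quad 0 \le \alpha_l \le C_2, \quad \forall i, l \qquad (19)$$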


The use of the Lagrange multipliers yields the following optimization
problem (Tax and Duin, 2004):

Maximize

$$\begin{aligned} W(\alpha_i, \alpha_l) = {} & \sum_i \alpha_i \mathcal{K}(x_i, x_i) - \sum_l \alpha_l \mathcal{K}(x_l, x_l) - \sum_{i,j} \alpha_i \alpha_j \mathcal{K}(x_i, x_j) \\ & - \sum_{l,m} \alpha_l \alpha_m \mathcal{K}(x_l, x_m) + 2 \sum_{l,j} \alpha_l \alpha_j \mathcal{K}(x_l, x_j) \end{aligned} \qquad (20)$$

subject to

$$0 \le \alpha_i \le C_1, \quad 0 \le \alpha_l \le C_2 \quad \forall i, l \qquad (21)$$

If we define new variables $\alpha'_n = r_n \alpha_n$ (in the following equations, the
indices $n$ and $q$ enumerate both the target and outlier objects), the
SVDD that uses negative examples is identical to the normal SVDD.
The constraints given in Eqs. (17) and (18) become $\sum_n (\alpha'_n)^* = 1$ and
$a^* = \sum_n (\alpha'_n)^* x_n$, and the testing function Eq. (13) can again be used.

Finally, for the case with negative examples, the testing function
is expressed as

$$f(z) = \left( \mathcal{K}(z, z) - 2 \sum_n (\alpha'_n)^* \mathcal{K}(z, x_n) + \sum_{n,q} (\alpha'_n)^* (\alpha'_q)^* \mathcal{K}(x_n, x_q) \right) \le R^2 \qquad (22)$$
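
The change of variables $\alpha'_n = r_n \alpha_n$ can be illustrated numerically. In the sketch below the multipliers are hand-picked values that merely satisfy the constraint linking targets and outliers (they are not the solution of an optimization), and the linear kernel is used so that the kernel distance can be compared with the ordinary Euclidean distance to the center; all names are hypothetical.

```python
import numpy as np


def signed_multipliers(alpha, labels):
    """alpha'_n = r_n * alpha_n, with r_n = +1 for targets, -1 for outliers."""
    return np.asarray(labels, dtype=float) * np.asarray(alpha, dtype=float)


def distance2_to_center(z, X, alpha_signed, kernel):
    """Left-hand side of the testing function written with signed multipliers."""
    k_zx = np.array([kernel(z, x) for x in X])
    K = np.array([[kernel(a, b) for b in X] for a in X])
    return float(kernel(z, z) - 2.0 * (alpha_signed @ k_zx)
                 + alpha_signed @ K @ alpha_signed)


def linear(u, v):
    return float(np.dot(u, v))


X = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])   # two targets, one outlier
labels = [1, 1, -1]
alpha = [0.6, 0.6, 0.2]          # hypothetical: 0.6 + 0.6 - 0.2 = 1

alpha_signed = signed_multipliers(alpha, labels)
center = alpha_signed @ X        # a = sum_n alpha'_n x_n

z = np.array([1.0, 0.0])
print(alpha_signed.sum(), center, distance2_to_center(z, X, alpha_signed, linear))
```

Because the signs are absorbed into the multipliers, the same distance routine serves both the target-only and the target-plus-outlier formulations.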

The leave-one-out cross-validation estimator was used to esti- 
mate our model hyperparameters (R,a, ff,Ji) (Cawley and Talbot, 
2003). This algorithm, often cited as being highly attractive for the 
purposes of model selection, provides an almost unbiased estimate. 
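
The leave-one-out procedure itself is generic and can be sketched independently of the SVDD. The toy one-dimensional nearest-class-mean classifier below is a stand-in used only to make the example runnable; it is not the classifier of the paper, and the function names are hypothetical.

```python
def leave_one_out_error(fit, predict, samples, labels):
    """Fraction of samples misclassified when each one is held out in turn.

    samples and labels are plain Python lists of equal length.
    """
    errors = 0
    for k in range(len(samples)):
        train_x = samples[:k] + samples[k + 1:]
        train_y = labels[:k] + labels[k + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[k]) != labels[k]:
            errors += 1
    return errors / len(samples)


def fit_class_means(xs, ys):
    """Toy model: the mean of the scalar samples of each class present."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_nearest_mean(means, x):
    """Assign the label whose class mean is closest to x."""
    return min(means, key=lambda y: abs(x - means[y]))


samples = [0.0, 0.2, 0.4, 2.0, 2.2, 2.4]
labels = [1, 1, 1, -1, -1, -1]
loo_error = leave_one_out_error(fit_class_means, predict_nearest_mean,
                                samples, labels)
print(loo_error)
```

For model selection, the same routine would be evaluated once per candidate hyperparameter value and the value with the lowest estimated error retained.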

2. Experiments 

This section presents a validation of the proposed approach. In 
Section 2.1, the initialization algorithm reliability is discussed. 
Two-class classification and change detection results for real data- 
sets are presented in Sections 2.2 and 2.3, respectively. To demon- 
strate the effectiveness of the proposed kernel function, 
comparisons with other kernel functions are presented throughout 
the present section. Finally, the choice of the Gaussian copula for 
feature dependency handling is discussed in Section 2.4. 

2.1. Initialization algorithm assessment 

To emphasize the benefits of the proposed initialization algo- 
rithm and the use of the fuzzy K-means in particular, three differ- 
ent methods were used to initialize the SV3DH: the proposed 
method, the K-means method and the maximum likelihood meth- 
od. We used artificial toy problems to demonstrate the effects of 
different initialization strategies. For this purpose, we generated 
three artificial toy examples. Samples $(x, y)$, $x \in \mathbb{R}^{10}$ and $y \in \{\pm 1\}$,
were constructed in the following manner:

1. Database 1: we set $x_i = g_i + z_i$ for $i \in 1, \dots, 5$ and $x_i = z_i$ for
$i \in 6, \dots, 10$, where the $z_i \sim \mathcal{N}(0, 1)$ are normally distributed
and $g_i \sim \mathcal{G}(1 + y_i/4, 1)$ obey the gamma distribution, whose
expression is given by Eq. 23. For both $z_i$ and $g_i$, the dependency
of the samples is 0.5.

$$\mathcal{G}(g_i; \alpha, \beta) = \frac{\beta^{\alpha}\, g_i^{(\alpha - 1)}}{R(\alpha)} \exp(-\beta g_i), \quad g_i > 0 \qquad (23)$$

where $R(\alpha) = \int_0^{\infty} t^{\alpha - 1} \exp(-t)\, dt$ and $\alpha$ and $\beta$ denote the shape and
inverse scale parameters, respectively.

2. Database 2: we set $x_i = g_i + z_i$ for $i \in 1, \dots, 5$ and $x_i = z_i$ for
$i \in 6, \dots, 10$, where the $z_i \sim \mathcal{N}(0, 1)$ are normally distributed
and $g_i \sim \mathcal{GG}(\alpha, \sigma, \mu)$ ($\sigma = 1$, $\mu = y_i$) obey the generalized Gaussian
distribution, whose expression is given by Eq. 24. For both $z_i$
and $g_i$, the dependency of the samples is 0.5.

$$f(g_i; \alpha, \sigma, \mu) = \frac{\eta(\alpha)\, \alpha}{2 R(1/\alpha)} \exp\left[ -\left( \eta(\alpha)\, |g_i - \mu| \right)^{\alpha} \right] \qquad (24)$$

where $\eta(\alpha) = \left[ \frac{R(3/\alpha)}{\sigma^2 R(1/\alpha)} \right]^{1/2}$, $R(\alpha) = \int_0^{\infty} t^{\alpha - 1} \exp(-t)\, dt$ and $\mu$, $\sigma$, and $\alpha$ denote
the mean, the standard deviation and the shape parameter,
respectively.

3. Database 3: we set $x_i = y_i/2 + z_i$ for $i \in 1, \dots, 5$ and $x_i = z_i$ for
$i \in 6, \dots, 10$, where the $z_i \sim \mathcal{N}(0, 1)$ are normally distributed.
For $z_i$, the dependency of samples is 0.5.

Thus, all coordinates are noisy and only the first five coordinates
carry task-relevant information. We drew 5000 examples that
were split into 50 partitions. The results of validation on the synthetic
databases are summarized in Table 1. The results demonstrate
that our method had the best performance. Thus, our
initialization algorithm is well suited for the proposed change-
detection method.

Table 1

The average classification error in % and standard deviation for the results of SV3DH
applied to synthetic data sets using different initialization algorithms.

Initialization   Database 1    Database 2    Database 3
Fuzzy k-means    4.49 ± 0.45   4.52 ± 0.68   3.52 ± 0.48
k-Means          5.39 ± 0.88   5.98 ± 0.82   4.99 ± 0.57
ML               5.12 ± 0.82   5.69 ± 0.71   4.74 ± 0.51
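
As an illustration of how such toy data can be produced, the sketch below generates samples following the Database 3 recipe and splits them into 50 partitions. Two simplifications are assumptions of this sketch: the labels are drawn with equal probability, and the noise coordinates are drawn independently, so the stated sample dependency of 0.5 is not reproduced. The function name is hypothetical.

```python
import numpy as np


def make_database3(n_samples, rng):
    """Draw (x, y) with x in R^10: x_i = y/2 + z_i for the first five
    coordinates and x_i = z_i for the last five, z_i ~ N(0, 1)."""
    y = rng.choice([-1, 1], size=n_samples)          # assumed equiprobable
    x = rng.normal(0.0, 1.0, size=(n_samples, 10))   # independent noise (assumed)
    x[:, :5] += y[:, None] / 2.0                     # informative coordinates
    return x, y


rng = np.random.default_rng(0)
x, y = make_database3(5000, rng)

# 50 partitions of 100 examples each, identified by their row indices.
partitions = np.array_split(np.arange(len(x)), 50)

print(x.shape, len(partitions), partitions[0].shape)
```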

2.2. Classification results 

Because our algorithm can be considered a two-class unsuper- 
vised classification algorithm (which uses target and outlier clas- 
ses), we proceed to an assessment of the classification method's 
performance. For this purpose, we used real datasets that were 
introduced in (Ratsch et al., 2001) as a collection of benchmarks 
for the binary classification task. This benchmark collection con- 
tains 1 artificial dataset (banana) and 12 real-world data sets 
(breast cancer, diabetes, german, heart, image segment, ringnorm, 
flare solar, splice, thyroid, titanic, twonorm, waveform). Each data- 
set contains 100 predefined subsets split into training and test 
samples. Each subset consists of a set of patterns with known clas- 
sification results (y e {±1}). Because the classification problem is 
treated in an unsupervised manner, the training and test samples 
were combined into one test portion. Note that all features have 
been normalized to zero mean and unit variance. The selection
of the appropriate data distributions is addressed using the method 
proposed in (McLachlan and Peel, 2000). 



To emphasize the benefits of the proposed approach for unsu- 
pervised classification, we compared the proposed method to six 
other methods: 

• the SVM method using the proposed copula kernel function 
(SVM with Dependence Handling, which is abbreviated 
SVMDH) and initialized using the fuzzy K-means algorithm, 

• the proposed method using only positive examples (denoted 
SV3DH+) and initialized using the fuzzy K-means algorithm, 

• the proposed method initialized using the K-means algorithm 
(denoted hard-SV3DH), 

• the SVDD method (Tax and Duin, 2004) with the RBF Gaussian 
kernel using positive and negative examples (denoted SVDD) 
and initialized using the fuzzy K-means algorithm, 

• the SVDD method with the RBF Gaussian kernel using only posi- 
tive examples (denoted SVDD+) and initialized using the fuzzy 
K-means algorithm, 

• and the SVM method (Cortes and Vapnik, 1995) with the RBF 
Gaussian kernel and initialized using the fuzzy K-means 
algorithm. 

Table 2 presents the results obtained for the real datasets from 
the benchmark collection. The SV3DH and SVMDH methods had 
similar performance. Thus, the proposed kernel function im- 
proves the feature discrimination for both the standard SVDD 
and SVM methods. However, when few labeled data are available,
purely supervised approaches such as SVMs yield poor solutions 
because they have no information about the changed class. In con- 
trast, the SVDD method yields very good results because the 
method can be used only with the positive samples. In this case, 
it tries to model the unchanged class accurately rather than build- 
ing a separating hyperplane between the changed and unchanged 
classes (Camps-Valls, 2006). For this reason, we use the SVDD 
method. Moreover, when a sufficient amount of labeled data is available
(which is true in our case), Table 2 demonstrates that the
SV3DH algorithm using positive and negative samples had supe- 
rior classification performance compared with the SV3DH algo- 
rithm that used only positive samples. 

2.3. Change detection results 

Table 2

Average classification error in % and standard deviation for real datasets classified using the following methods: the proposed method (SV3DH), the SVM method using the
proposed copula kernel function (SVMDH), the proposed method using only positive examples (SV3DH+), the proposed method initialized using the k-means algorithm (hard-
SV3DH), the SVDD method trained using positive and negative examples (SVDD), the SVDD trained using only positive examples (SVDD+) and the SVM method.

              Breast cancer   Diabetes       Flare-Solar    Twonorm       Titanic        Waveform
SV3DH         28.23 ± 4.35    24.24 ± 1.71   34.34 ± 1.28   3.21 ± 0.48   23.55 ± 2.03   10.22 ± 0.93
SVMDH         28.31 ± 4.27    24.18 ± 1.67   34.43 ± 1.07   3.17 ± 0J1    23.47 ± 1.89   10.09 ± 1.02
SV3DH+        29.04 ± 5.31    26.01 ± 2.78   34.89 ± 2.17   3.97 ± 1.10   24.89 ± 3.66   11.01 ± 1.39
Hard-SV3DH    29.28 ± 5.82    26.39 ± 3.98   35.29 ± 2.92   4.32 ± 1.39   25.21 ± 3.98   11.43 ± 1.65
SVDD          31.35 ± 5.89    29.54 ± 2.52   35.43 ± 2.36   5.78 ± 0.68   27.21 ± 4.82   12.96 ± 1.59
SVDD+         31.79 ± 6.71    29.92 ± 3.22   36.04 ± 2.55   5.90 ± 0.77   27.48 ± 4.08   13.09 ± 1.69
SVM           32.41 ± 5.32    28.11 ± 4.21   36.14 ± 2.03   4.69 ± 0.44   26.43 ± 3.20   11.58 ± 3.08

              German          Heart          Image          Ringnorm      Splice         Thyroid
SV3DH         24.19 ± 2.25    5.95 ± 1.84    4.02 ± 0.67    2.03 ± 0.25   11.25 ± 0.77   5.98 ± 1.24
SVMDH         24.11 ± 1.89    5.83 ± 2.01    3.94 ± 0.78    2.12 ± 0.31   11.09 ± 0.51   6.10 ± 1.17
SV3DH+        24.52 ± 2.89    6.15 ± 2.42    4.39 ± 1.11    2.69 ± 0.88   11.61 ± 1.06   6.56 ± 1.90
Hard-SV3DH    24.95 ± 2.91    6.49 ± 2.62    4.84 ± 1.48    3.05 ± 1.28   11.88 ± 1.17   6.69 ± 2.17
SVDD          28.35 ± 1.75    7.24 ± 3.10    6.87 ± 0.99    3.84 ± 0.66   13.05 ± 2.29   8.21 ± 2.54
SVDD+         28.81 ± 2.03    7.68 ± 3.14    6.98 ± 1.55    4.12 ± 0.87   13.74 ± 3.05   8.47 ± 2.39
SVM           27.85 ± 2.35    6.21 ± 0.98    6.41 ± 1.02    4.74 ± 0.49   11.78 ± 3.54   9.72 ± 3.01

To evaluate the performance of the proposed algorithm for change
detection on real datasets, we considered two multitemporal remote
sensing image data sets acquired from geographical areas of Alaska
and Philadelphia that are available from december (2011). The first
database (the Alaska image) contains a high-resolution (1305 x 1520
pixels) set of multispectral images collected for a geographical area
of Alaska. These images were acquired by the Landsat-5 Thematic
Mapper (TM) on July 22, 1985 and July 13, 2005. An area of
1024 x 1024 pixels is selected for the experiments. The Landsat-5
TM provides optical images for seven spectral bands, Bands 1-7. The
instrument's pixel resolution is 30 m. The ground truth of the change
detection maps is available from december (2011).

The second database (the Philadelphia image) contains a
high-resolution (2000 × 2000 pixels) set of multispectral images
collected for a geographical area of Philadelphia. These images were
acquired by the Landsat-5 Thematic Mapper (TM) on June 28, 1988 and
by the Landsat-7 Enhanced Thematic Mapper (ETM+) on September 23,
1999. Like Landsat-5, Landsat-7 provides optical images for seven
spectral bands. An area of 1024 × 1024 pixels is selected from the
Philadelphia image for the experiments. The pixel size for all bands
is 28.5 m. This includes the Landsat-7 ETM+ thermal band, which has
been resampled from its 57 m resolution, and the Landsat-5 TM thermal
band, which has been resampled from its 114 m resolution. The ground
truth for the change detection maps is available in (december, 2011).

For multitemporal change detection, we consider the multispectral
difference image, $I_2 - I_1$, computed over the seven spectral bands.
The high-dimensional information present in the multispectral
difference image is thus exploited to improve the change detection
accuracy. Fig. 1 (Fig. 2) displays the feature distribution of the
unchanged class (gray) and changed class (dark) pixels in the
2-dimensional difference image for the Alaska image (Philadelphia
image) according to the available ground truth map. As one can
observe from Fig. 2, the change detection problem for the
Philadelphia image is significantly more complex than that for the
Alaska image because the target and outlier classes significantly
overlap.
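
As an illustration of the difference image described above, the following minimal sketch subtracts the first acquisition from the second, band by band, and then collects one feature vector per pixel. The `[band][row][col]` nested-list layout and the function names are assumptions made for this example only; they are not taken from the original implementation.

```python
def difference_image(image_t1, image_t2):
    """Per-band, per-pixel difference I2 - I1.

    Both images are nested lists indexed as [band][row][col]
    (an assumed layout for this sketch).
    """
    return [
        [[p2 - p1 for p1, p2 in zip(row1, row2)]
         for row1, row2 in zip(band1, band2)]
        for band1, band2 in zip(image_t1, image_t2)
    ]


def pixel_features(diff):
    """Return one feature vector (one value per band) for each pixel."""
    bands = len(diff)
    rows = len(diff[0])
    cols = len(diff[0][0])
    return [
        [diff[b][r][c] for b in range(bands)]
        for r in range(rows)
        for c in range(cols)
    ]
```

With seven spectral bands, each pixel would be described by a seven-component vector; the figures show only two of these components.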

To perform the change detection evaluation, we use the False
Alarm $P_{FA}$, the Miss Detection $P_{MD}$ and the Total Error
$P_{TE}$ measurements, computed as percentages and defined by

$$P_{FA} = \frac{FA}{N_F} \times 100\%; \quad P_{MD} = \frac{MD}{N_M} \times 100\%; \quad P_{TE} = \frac{MD + FA}{N_M + N_F} \times 100\%$$

where $FA$ denotes the number of unchanged pixels that were
incorrectly identified as changed pixels, $N_F$ denotes the total
number of unchanged pixels, $MD$ denotes the number of changed pixels
that were mistakenly detected as unchanged ones, and $N_M$ denotes the
total number of changed pixels.
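
The three measurements can be computed directly from a ground truth mask and a predicted mask. The sketch below is one possible implementation; the function name and the label encoding (0 for unchanged, 1 for changed) are assumptions introduced for this example.

```python
def change_detection_errors(truth, predicted):
    """Return (P_FA, P_MD, P_TE) in percent.

    truth, predicted: equal-length sequences of pixel labels, with
    0 for an unchanged pixel and 1 for a changed pixel (assumed encoding).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    n_f = sum(1 for t in truth if t == 0)  # total unchanged pixels, N_F
    n_m = sum(1 for t in truth if t == 1)  # total changed pixels, N_M
    if n_f == 0 or n_m == 0:
        raise ValueError("both classes must be present in the ground truth")
    # FA: unchanged pixels labelled as changed.
    fa = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    # MD: changed pixels labelled as unchanged.
    md = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    p_fa = 100.0 * fa / n_f
    p_md = 100.0 * md / n_m
    p_te = 100.0 * (md + fa) / (n_m + n_f)
    return p_fa, p_md, p_te
```

For example, with eight unchanged and two changed pixels, one false alarm and one missed detection give $P_{FA} = 12.5\%$, $P_{MD} = 50\%$ and $P_{TE} = 20\%$. Because $P_{TE}$ pools both classes, it is dominated by the larger class when the classes are unbalanced.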

Table 3 presents the false detections, the missed detections and 
the total errors for both databases. The SV3DH and the SVMDH 
methods have similar results, but the SV3DH method performs 

Fig. 1. Distribution of the unchanged class (gray) and the changed class (dark)
pixels in the 2-dimensional difference image for the Alaska image according to
the available ground truth.

Fig. 2. Distribution of the unchanged class (gray) and the changed class (dark)
pixels in the 2-dimensional difference image for the Philadelphia image
according to the available ground truth.



Table 3 

False detection, missed detection and total errors for the proposed method (SV3DH), 
the SVM method using the proposed copula kernel function (SVMDH), the proposed 
method using only positive examples (SV3DH+), the proposed method initialized 
using the k-means algorithm (hard-SV3DH), the SVDD method using positive and 
negative examples (SVDD), the SVDD method using only positive examples (SVDD+) 
and the SVM method. 



                     False detection (%)   Missed detection (%)   Total errors (%)

Alaska image
SV3DH                0.71                  5.01                   1.09
SVMDH                0.70                  4.96                   1.07
SV3DH+               0.78                  5.32                   1.18
Hard-SV3DH           0.84                  5.99                   1.29
SVDD                 1.87                  6.81                   2.01
SVDD+                1.89                  7.03                   2.11
SVM                  1.04                  6.31                   1.75

Philadelphia image
SV3DH                3.82                  14.54                  8.34
SVMDH                4.27                  16.07                  9.07
SV3DH+               4.41                  16.84                  9.79
Hard-SV3DH           4.87                  16.93                  10.35
SVDD                 5.09                  17.79                  11.09
SVDD+                6.17                  18.38                  12.21
SVM                  5.31                  17.91                  11.39



slightly better on the Philadelphia image. This can be explained
by the fact that when the changed and unchanged classes overlap,
building a hypersphere is more effective than building a hyperplane.
Moreover, the fuzzy k-means initialization yields better results than
the k-means initialization, especially when the uncertainty is high
(e.g., the Philadelphia image). Indeed, we obtained 8.34% total error
with the fuzzy initialization, whereas the k-means initialization
yielded 10.35% total error. Fig. 3 shows the change detection results
from the different algorithms for the Alaska image. The SVMDH and the
SV3DH algorithms yielded similar results.

2.4. Copula choice assessment 

To emphasize the benefits of the Gaussian copula for feature
dependency handling, we compared the feature goodness-of-fit
of the proposed copula with five other copula functions: the
t-student copula (Demarta and McNeil, 2005), the
Farlie-Gumbel-Morgenstern (FGM) copula (Cossette and Marceau, 2008),
the Gumbel copula, the Frank copula and the Clayton copula
(Rodriguez, 2007). For this purpose, we used the copula
goodness-of-fit measurement approach proposed in (Genest et al.,
2008). This approach

Fig. 3. Change detection results from the different algorithms for the Alaska image: (a) the ground truth change detection mask; (b) the SVMDH algorithm; (c) the SV3DH
algorithm; (d) the SV3DH+ algorithm; (e) the hard-SV3DH algorithm; (f) the SVM algorithm; (g) the SVDD algorithm and (h) the SVDD+ algorithm.



Table 4

The $L_2$ norm for different copula types with the empirical copula.

Copula        Alaska image 10^3   Philadelphia image 10^3
Gaussian      5.12                4.02
t-student     5.48                4.19
FGM           5.91                4.61
Clayton       6.85                5.17
Frank         5.57                4.28
Gumbel        6.11                4.97



measures the discrete $L_2$ norm between each copula in a candidate
set and the empirical copula and then selects the copula with the
minimum difference. We applied this approach to the real datasets.
The results are presented in Table 4. The Gaussian copula best
approximates the empirical copula. Note that the choice of the copula
function is still an open problem (Didier et al., 2010).
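
The selection rule described above can be sketched for the bivariate case as follows. This is a simplified illustration under stated assumptions, not the procedure of Genest et al. (2008): the distance is evaluated only at the sample points, the candidate set is limited to two copulas with closed-form expressions (independence and Clayton), the Clayton parameter is fixed rather than estimated, and all function names are hypothetical. The Gaussian copula is omitted because its distribution function has no closed form.

```python
import math
import random


def pseudo_observations(values):
    """Map a sample to (0, 1) through its ranks: rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return [r / (n + 1) for r in ranks]


def empirical_copula(u, v, us, vs):
    """Fraction of pseudo-observations lying in [0, u] x [0, v]."""
    hits = sum(1 for a, b in zip(us, vs) if a <= u and b <= v)
    return hits / len(us)


def independence(u, v):
    """Independence copula C(u, v) = u * v."""
    return u * v


def clayton(u, v, theta=2.0):
    """Clayton copula; theta = 2 is an assumed, not fitted, parameter."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def l2_distance(copula, us, vs):
    """Discrete L2 distance to the empirical copula at the sample points."""
    sq = sum((empirical_copula(u, v, us, vs) - copula(u, v)) ** 2
             for u, v in zip(us, vs))
    return math.sqrt(sq)


def select_copula(candidates, us, vs):
    """Return the name of the closest candidate and all distances."""
    scores = {name: l2_distance(c, us, vs) for name, c in candidates.items()}
    return min(scores, key=scores.get), scores


def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], avoids a zero base below
        w = 1.0 - rng.random()
        v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** -theta + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs
```

On synthetic pairs drawn from a Clayton copula, `select_copula` is expected to prefer the Clayton candidate over independence, which mirrors how the minimum-distance copula is retained in Table 4.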

3. Conclusions 

In this paper, we have proposed the SV3DH method for change
detection, which is based on the SVDD method. We have focused
on the formulation of the change detection problem as a minimum
enclosing sphere problem with unchanged samples as target objects.
The use of the dependency measurement increases the robustness of
the proposed change-detection scheme compared with the classical
SVM and SVDD methods. The performance gain depends on how well the
prior distribution of the samples is modeled, which in turn depends
on the chosen distributions; copula theory was therefore used.
Copula theory provides tools to model the dependency of samples even
if their distribution is not Gaussian. The experimental results
indicate the benefits of the proposed method.